Dynamic potential-based reward shaping
نویسندگان
چکیده
Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multiagent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
منابع مشابه
Expressing Arbitrary Reward Functions as Potential-Based Advice
Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly...
متن کاملPotential-based difference rewards for multiagent reinforcement learning
Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...
متن کاملPlan-based reward shaping for multi-agent reinforcement learning
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...
متن کاملMulti-agent, reward shaping for RoboCup KeepAway
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory [2], potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of statebased and state-action-based reward shaping in...
متن کاملPotential-Based Shaping and Q-Value Initialization are Equivalent
Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for seve...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012